feat: parallelize container monitor dependency requests #6757
Conversation
✅ Snyk checks have passed. No issues have been found so far.
Benchmark — `quay.io/wildfly/wildfly:34.0.1.Final-jdk21` (512 ScanResults)

| Configuration | Wall-clock | vs sequential |
|---|---|---|
| sequential (snyk v1.1303.2, prior code) | 481 s (8m 1s) | 1.00× |
| `SNYK_REQUEST_CONCURRENCY=5` (PR B) | 104 s | 4.6× faster |
| default c=10 (PR B) | 54 s | 8.9× faster |
| `SNYK_REQUEST_CONCURRENCY=20` (PR B) | 29 s | 16.6× faster |
Each PR-B run successfully updated all 512 snapshots. The sequential run produced 510 snapshots, possibly due to minor version drift between v1.1303.2 and the PR binary's bundled `snyk-docker-plugin`. No errors were logged, and the discrepancy doesn't affect the timing comparison.
```ts
const MAX_REQUEST_CONCURRENCY = 50;

/**
 * Returns the maximum number of in-flight Snyk dependency-test or
```
Do you know if these APIs rate limit? Could we run into this if a customer (like Walmart) sets this to 50?
Good question. I am worried about that. I'm not sure how the rate limiting occurs (e.g., dest endpoint, source IP, etc.). I need to look into this. I'm hoping going from 5->10 as a default is relatively safe, but I don't think we can do more without some testing.
Rate-limit investigation summary
I dug into the api-gateway config (github.com/snyk/api-gateway) to confirm the bumped concurrency won't push CLI users over rate limits. Here's what I found.
Rate-limit policy for our endpoints
`POST /v1/test-dependencies`, `POST /v1/test-dep-graph`, and `PUT /v1/monitor-dependencies` all match the default `/v1/test*` and `/v1/monitor*` prefix routes in `helm/templates/route-tables/api-v1.yaml`. None of them have a custom `envoyMetadata.snyk.ratelimiter.keys.bucket` override, so they fall into the default `high` bucket of the api-v1 rate-limit config.
- Keyed by `principal_id` — per-authenticated-user (per-token), per `docs/rate-limiting.md`. Not per-IP, not per-cluster.
- Default bucket: `high`. Limits per token:
| Window | Limit |
|---|---|
| Per second | 200 req/s |
| Per minute | 2,000 req/min (≈ 33/s sustained) |
| Per hour | 60,000 req/hour (≈ 16.7/s sustained) |
All three apply simultaneously (burst, 1-min, 1-hour ceilings).
The only adjacent exception is `lowerDepGraphRateLimit.enabled` (polaris-prod-pc-pc01-1 only), which drops `/v1/monitor/dep-graph` (with slash, not our endpoint) to `extra_low`. It doesn't affect us.
Headroom analysis for this PR
The default change is 5 → 10 in-flight requests per CLI invocation. The proposed env override (`SNYK_REQUEST_CONCURRENCY`) caps at 50.
| Scenario | Observed peak rate | Limit | Headroom |
|---|---|---|---|
| Wildfly container test, c=10 | ~13 req/s sustained | 200 req/s burst | 15× |
| Wildfly container test, c=20 | ~20 req/s sustained | 200 req/s burst | 10× |
| Wildfly container test, c=50 (max override) | ~50 req/s burst | 200 req/s burst | 4× |
For a single invocation, no setting in the supported range gets close to the per-second ceiling. The per-minute and per-hour buckets track total request count rather than concurrency: a 512-ScanResult run issues ~512 requests in total, roughly a quarter of the 2,000/min allowance even if it completes within a single minute.
Other context from registry-side telemetry
For completeness, I checked Datadog for actual 429s on the registry pod:
| Endpoint | 30-day requests | 30-day 429s | 429 rate |
|---|---|---|---|
| `POST /test-dep-graph` | 9.0 M | 24 | 0.00027% |
| `POST /test-dependencies` | 1.65 M | 22 | 0.0013% |
| `PUT /monitor-dependencies` | 1.04 M | 0 | 0% |
Caveat: the Envoy gateway returns 429s to clients before the request reaches the registry pod, so registry-side 429 counts under-represent total rejections. At the cluster level, ~1% of total traffic to `static-default-registry-web-30100_api-gateway` (~807M requests over 30 days) is rejected — but that's all paths combined, and per-endpoint attribution isn't available at the metric layer.
Verdict
The bump from 5 → 10 (and the 50 cap on the env override) is well within the per-token rate limit policy for these endpoints. Even at the highest supported override, a single CLI burst stays at ≤25% of the per-second ceiling and doesn't change per-minute/per-hour quota consumption.
Recommended follow-up (optional, separate PR)
The current retry logic in `run-test.ts:265-296` retries on 500/502/503 but not 429. If a heavy user (a CI fleet sharing a token) ever hits the per-hour bucket — which is plausible regardless of this PR — they get a hard failure today.
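A possible shape for that follow-up (a sketch only, not the existing `run-test.ts` code; `requestWithRetry` and the constants are illustrative stand-ins) would treat 429 as retryable and honor `Retry-After`:

```ts
// Hypothetical sketch, not the current run-test.ts implementation.
// fetch() stands in for the real request helper; the constant names echo
// RETRY_ATTEMPTS / RETRY_DELAY but their values here are assumptions.
const RETRY_ATTEMPTS = 3;
const RETRY_DELAY_MS = 500;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function requestWithRetry(url: string, init?: RequestInit): Promise<Response> {
  let lastError: unknown = new Error('request failed after retries');
  for (let attempt = 0; attempt < RETRY_ATTEMPTS; attempt++) {
    try {
      const res = await fetch(url, init);
      if (res.status === 429) {
        // Prefer the server's Retry-After hint (in seconds); otherwise back
        // off exponentially so a shared-token CI fleet drains the bucket
        // instead of hammering it.
        const retryAfterSec = Number(res.headers.get('retry-after'));
        const delayMs =
          Number.isFinite(retryAfterSec) && retryAfterSec > 0
            ? retryAfterSec * 1000
            : RETRY_DELAY_MS * 2 ** attempt;
        await sleep(delayMs);
        continue;
      }
      if ([500, 502, 503].includes(res.status)) {
        await sleep(RETRY_DELAY_MS); // existing behavior: retry transient 5xx
        continue;
      }
      return res; // success, or a non-retryable status such as other 4xx
    } catch (err) {
      lastError = err; // network-level failure: retry after a flat delay
      await sleep(RETRY_DELAY_MS);
    }
  }
  throw lastError;
}
```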
```ts
import {
  RETRY_ATTEMPTS,
  RETRY_DELAY,
  getRequestConcurrency,
```
Do we know why concurrency was not enabled for monitor-deps?
I assume it was just an oversight; I don't know for sure.
bdemeo12 left a comment

LGTM! Just a few questions!
Rewrite `monitorDependencies` in `src/lib/ecosystems/monitor.ts` to fan out per-ScanResult `/monitor-dependencies` PUTs in parallel via `pMap`, bounded by the `SNYK_REQUEST_CONCURRENCY` limit (default 5). Per-ScanResult work is extracted into `monitorOneScanResult` for testability and clarity.

Container images that produce many ScanResults (e.g. one per directory of JARs in fat-JAR-heavy images) previously incurred one full RTT per scan result, since the prior implementation used a nested for-loop with `await`. With bounded parallelism this collapses to ~ceil(N / concurrency) sequential batches, materially reducing wall-clock time for large images.

Error semantics are preserved:

- 401 still throws `AuthFailedError` (terminates the run).
- Other 4xx still throws `MonitorError` (terminates via `pMap` fail-fast).
- 5xx and other non-4xx errors are accumulated per-ScanResult into the `errors` array, matching the prior continue-on-error behavior.

Result order is preserved by `pMap` based on input order, so output remains deterministic regardless of completion order.

Tests cover the concurrency cap (default 5), the env override via the internal `SNYK_INTERNAL_REQUEST_CONCURRENCY` contract that the wrapping Go CLI forwards, ordering preservation, 4xx fail-fast, and 5xx accumulation.
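As a reading aid, here is a minimal sketch of the shape the commit message describes — not the actual diff; the types and the declared helpers (`ScanResult`, `MonitorResult`, `monitorOneScanResult`, `getRequestConcurrency`) are placeholders standing in for the real ones:

```ts
import pMap from 'p-map';

// Placeholder types/helpers standing in for the real ones in the diff.
type ScanResult = { identity: { type: string } };
type MonitorResult = { ok: boolean; projectName?: string };

class AuthFailedError extends Error {}
class MonitorError extends Error {
  constructor(public statusCode: number) {
    super(`monitor-dependencies failed with status ${statusCode}`);
  }
}

declare function monitorOneScanResult(scanResult: ScanResult): Promise<MonitorResult>;
declare function getRequestConcurrency(): number; // clamped; default 5

async function monitorDependencies(
  scanResults: ScanResult[],
): Promise<Array<MonitorResult | Error>> {
  // At most `concurrency` PUTs in flight; results come back in input order,
  // so the output is deterministic regardless of completion order.
  return pMap(
    scanResults,
    async (scanResult) => {
      try {
        return await monitorOneScanResult(scanResult);
      } catch (err) {
        const code = (err as { code?: number }).code ?? 0;
        // 401 and other 4xx rethrow: pMap's default fail-fast rejects the
        // whole run, matching the prior terminate-on-4xx behavior.
        if (code === 401) throw new AuthFailedError('authentication failed');
        if (code >= 400 && code < 500) throw new MonitorError(code);
        // 5xx and transport errors: record in this slot and keep going,
        // matching the prior continue-on-error behavior.
        return err as Error;
      }
    },
    { concurrency: getRequestConcurrency() },
  );
}
```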
PR Reviewer Guide 🔍
What does this PR do?
Rewrites `monitorDependencies` in `src/lib/ecosystems/monitor.ts` to fan out per-ScanResult `PUT /monitor-dependencies` calls in parallel via `pMap`, bounded by the `getRequestConcurrency()` helper introduced in #6756 (default 5, user-overridable via `SNYK_REQUEST_CONCURRENCY`).

Effective change for users: container monitor goes from sequential (1 in-flight request) to 5 in-flight by default, with the same env-var override surface as `snyk container test`.

- Per-ScanResult work is extracted into a `monitorOneScanResult` helper for testability.
- … (a single `monitor` invocation is the common case anyway).
- The core loop becomes `pMap(scanResults, ..., { concurrency })`.

This PR is based on `feat/snyk-request-concurrency` (#6756), so its diff shows only the monitor-flow changes.

Why?
`snyk container monitor` previously ran fully sequentially — one full RTT per ScanResult — because of a nested `for...of` `await` loop. For large images with many ScanResults (e.g. fat-JAR-heavy images that produce one ScanResult per directory of JARs in `snyk-docker-plugin`'s `groupJarFingerprintsByPath`), this is the dominant wall-clock cost.

Benchmark — `quay.io/wildfly/wildfly:34.0.1.Final-jdk21` (512 ScanResults)

Sequential baseline measured against the released CLI (`snyk` v1.1303.2, prior code path). PR configurations measured against the locally-built PR binary, varying only `SNYK_REQUEST_CONCURRENCY`. Re-monitoring updates existing project snapshots rather than creating duplicates.

| Configuration | Wall-clock | vs sequential |
|---|---|---|
| sequential (`snyk` v1.1303.2, prior code) | 481 s (8m 1s) | 1.00× |
| default (c=5, this PR) | 104 s | 4.6× faster |
| `SNYK_REQUEST_CONCURRENCY=10` (c=10) | 54 s | 8.9× faster |
| `SNYK_REQUEST_CONCURRENCY=20` (c=20) | 29 s | 16.6× faster |

Each PR run successfully updated all 512 snapshots. The sequential run produced 510 snapshots, possibly due to minor version drift between v1.1303.2 and the PR binary's bundled `snyk-docker-plugin`. No errors were logged, and the discrepancy doesn't affect the timing comparison.

The default-of-5 row is the new behavior users will see without any opt-in.
Error semantics preserved
- 401 still throws `AuthFailedError` (terminates the run).
- Other 4xx still throws `MonitorError` (terminates the run via `pMap`'s default fail-fast on rejection).
- 5xx and other non-4xx errors are accumulated per-ScanResult into the `errors` array, matching prior continue-on-error behavior.
pMapreturns results in input order regardless of completion order, so the formatted output remains deterministic.Where should the reviewer start?
src/lib/ecosystems/monitor.ts—monitorDependenciesrewrite + newmonitorOneScanResult.test/jest/unit/ecosystems-monitor-docker.spec.ts— 5 new tests in aparallelization of monitor-dependencies requestsdescribe:SNYK_INTERNAL_REQUEST_CONCURRENCY=3override respected (the internal env var the TS code reads; the user-facingSNYK_REQUEST_CONCURRENCYis plumbed through Go in feat: introduce SNYK_REQUEST_CONCURRENCY for dependency request parallelism #6756).MonitorError).How should this be manually tested?
npx jest --selectProjects coreCli --testPathPattern 'test/jest/unit/ecosystems-monitor-docker'SNYK_REQUEST_CONCURRENCYis plumbed through:Risk assessment
Low. The change is scoped to a single function. Error semantics are preserved (verified by tests). Concurrency is bounded and configurable.
pMapis already used byrunTestso it's a familiar dependency in this codebase.Background
Depends on / based on #6756, which:
- adds the `getRequestConcurrency` helper and clamps, and
- plumbs the `SNYK_REQUEST_CONCURRENCY` env var through the Go-side GAF configuration so the legacy CLI receives the resolved value via the internal `SNYK_INTERNAL_REQUEST_CONCURRENCY` env var.

Supersedes #6748, which:
- used unbounded `Promise.all`, which on a 500-ScanResult image would fire 500+ concurrent PUTs — a real risk of 429s, socket exhaustion, or thundering-herd backend load;
- used a "… `Promise.all` the rest" hybrid pattern; once results are kept in input order via `pMap`, the special case isn't load-bearing.
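To make that contrast concrete (a sketch; `monitorOne` stands in for the per-ScanResult worker):

```ts
import pMap from 'p-map';

declare const scanResults: unknown[];
declare function monitorOne(scanResult: unknown): Promise<unknown>;

// #6748's unbounded pattern: every PUT starts immediately, so a
// 500-ScanResult image opens 500+ concurrent requests at once.
await Promise.all(scanResults.map(monitorOne));

// This PR's bounded pattern: at most 5 requests in flight at any moment,
// with results still returned in input order.
await pMap(scanResults, monitorOne, { concurrency: 5 });
```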